1 Design and data ...
Maria-Cecilia Rivara 

August 1984 ACM Transactions on Mathematical Software (TOMS), Volume 10 issue 3 

Full text available: pdf. Additional Information: full citation, references, citings, index terms, review



2 SuperLU_DIST: a scalable ...
Xiaoye S. Li, James W. Demmel 

June 2003 ACM Transactions on Mathematical Software (TOMS), Volume 29 issue 2 

Full text available: pdf (659.03 KB). Additional Information: full citation, abstract, references, citings, index terms



We present the main algorithmic features in the software package SuperLU_DIST, a 
distributed-memory sparse direct solver for large sets of linear equations. We give in detail 
our parallelization strategies, with a focus on scalability issues, and demonstrate the 
software's parallel performance and scalability on current machines. The solver is based on 
sparse Gaussian elimination, with an innovative static pivoting strategy proposed earlier by 
the authors. The main advantage of static pivoting o ... 

Keywords: Sparse direct solver, distributed-memory computers, parallelism, scalability, 
supernodal factorization 



3 Automatic test data generation using constraint solving techniques 
Arnaud Gotlieb, Bernard Botella, Michel Rueher 

March 1998 ACM SIGSOFT Software Engineering Notes, Proceedings of the 1998 ACM
SIGSOFT international symposium on Software testing and analysis, volume 23 Issue 2

Full text available: pdf (841.83 KB). Additional Information: full citation, references

Automatic test data generation leads to identify input values on which a selected point in a 
procedure is executed. This paper introduces a new method for this problem based on 
constraint solving techniques. First, we statically transform a procedure into a constraint 
system by using well-known "Static Single Assignment" form and control-dependencies. 
Second, we solve this system to check whether at least one feasible control flow path going 
through the selected point exists and to generate test ... 
Keywords: automatic test data generation, constraint solving techniques, global
constraints, structural testing 



4 Type system 

Ben Liblit, Alexander Aiken

January 2000 Proceedings of the 27th ACM SIGPLAN-SIGACT symposium on Principles 
of programming languages 

Full text available: pdf. Additional Information: full citation, abstract, references, citings, index terms

Distributed-memory programs are often written using a global address space: any process 
can name any memory location on any processor. Some languages completely hide the 
distinction between local and remote memory, simplifying the programming model at some 
performance cost. Other languages give the programmer more explicit control, offering 
better potential performance but sacrificing both soundness and ease of use. Through a 
series of progressively richer type systems, we formal ... 

5 PYTHIA-II: a knowledge/database system for managing performance data and
recommending scientific software 

Elias N. Houstis, Ann C. Catlin, John R. Rice, Vassilios S. Verykios, Naren Ramakrishnan, 
Catherine E. Houstis 

June 2000 ACM Transactions on Mathematical Software (TOMS), volume 26 issue 2 

Full text available: pdf. Additional Information: full citation, abstract, references, citings, index terms

Often scientists need to locate appropriate software for their problems and then select from 
among many alternatives. We have previously proposed an approach for dealing with this 
task by processing performance data of the targeted software. This approach has been 
tested using a customized implementation referred to as PYTHIA. This experience made us 
realize the complexity of the algorithmic discovery of knowledge from performance data and 
of the management of these data together with the d ... 

Keywords: data mining, inductive logic programming, knowledge discovery in databases, 
knowledge-based systems, performance evaluation, recommender systems, scientific 
software 



6 Multi-relation ...

Hendrik Blockeel, Michele Sebag 

July 2003 ACM SIGKDD Explorations Newsletter, volume 5 issue 1

Full text available: pdf. Additional Information: full citation, abstract, references, citings

Efficiency and Scalability have always been important concerns in the field of data mining, 
and are even more so in the multi-relational context, which is inherently more complex. The 
issue has been receiving an increasing amount of attention during the last few years, and 
quite a number of theoretical results, algorithms and implementations have been presented 
that explicitly aim at improving the efficiency and Scalability of multi-relational data mining 
approaches. With this article we attempt ... 
May 1988 ACM SIGARCH Computer Architecture News, Proceedings of the 15th
Annual International Symposium on Computer architecture, Volume 16 issue 2

Full text available: pdf. Additional Information: full citation, abstract, references, citings, index terms

Partial differential equations can be found in a host of engineering and scientific problems. 
The emergence of new parallel architectures has spurred research in the definition of parallel 
PDE solvers. Concurrently, highly programmable systems such as data-flow architectures 
have been proposed for the exploitation of large scale parallelism. The implementation of 
some Partial Differential Equation solvers (such as the Jacobi method) on a tagged token 
data-flow graph is demonstrated here. As ... 

8 Supporting ...

C. Koelbel, P. Mehrotra, J. Van Rosendale

February 1990 ACM SIGPLAN Notices, Proceedings of the second ACM SIGPLAN
symposium on Principles & practice of parallel programming, volume 25 Issue 3

Full text available: pdf (1.14 MB). Additional Information: full citation, abstract, references, citings, index terms

Programming nonshared memory systems is more difficult than programming shared 
memory systems, since there is no support for shared data structures. Current programming 
languages for distributed memory architectures force the user to decompose all data 
structures into separate pieces, with each piece "owned" by one of the processors in the 
machine, and with all communication explicitly specified by low-level message-passing 
primitives. This paper presents a new programming envir ... 

9 Climate data assimilation on a massively parallel Supercomputer 
Hong Q. Ding, Robert D. Ferraro 

November 1996 Proceedings of the 1996 ACM/IEEE conference on Supercomputing 
(CDROM) 

Full text available: pdf (62.45 KB). Additional Information: full citation, abstract, references, citings, index terms

We have designed and implemented a set of highly efficient and highly scalable algorithms 
for an unstructured computational package, the PSAS data assimilation package, as 
demonstrated by detailed performance analysis of systematic runs on up to 512-nodes of an 
Intel Paragon. The preconditioned Conjugate Gradient solver achieves a sustained 18 Gflops 
performance. Consequently, we achieve an unprecedented 100-fold reduction in time to 
solution on the Intel Paragon over a single head of a Cra ... 

10 A data-level ... flows

Mustafa Q. Pinar, Stavros A. Zenios 

December 1994 ACM Transactions on Mathematical Software (TOMS), volume 20 issue 4 
Full text available: pdf. Additional Information: full citation, abstract, references, index terms

We describe the development of a data-level, massively parallel, software system for the 
solution of multicommodity network flow problems. Using a smooth linear-quadratic penalty 
(LQP) algorithm we transform the multicommodity network flow problem into a sequence of 
independent min-cost network flow subproblems. The solution of these problems is 
coordinated via a simple, dense, nonlinear master program to obtain a solution that is 
feasible within some user-specified tolerance to the origina ... 

Keywords: massively parallel algorithms, multicommodity network problems, parallel 
optimization 
M. E. D'Imperio

December 1969 ACM SIGMOD Record, Volume 1 Issue 2

Full text available: pdf. Additional Information: full citation, references



12 Information ...

M. E. D'Imperio

December 1969 ACM SIGMOD Record, volume 1 issue 2

Full text available: pdf (1.56 MB). Additional Information: full citation, references



13 Algorithm 817: P2MESH ... meshes and FEM/FVM-based PDE solvers
Enrico Bertolazzi, Gianmarco Manzini

March 2002 ACM Transactions on Mathematical Software (TOMS), Volume 28 issue 1

Full text available: pdf. Additional Information: full citation, abstract, references, citings, index terms

The software interface P2MESH is a collection of C++ class templates suitable for developing 
prototypes of high-performance PDE solvers on unstructured 2-D meshes. P2MESH supports 
several discretization methods on triangles and quadrilaterals, such as finite volume or finite 
element. The design philosophy of P2MESH does not consider specific model problems or 
built-in approximation algorithms. The software package is general purpose and it may also 
be used as a building block in the implementati ... 

Keywords: Finite Element, Finite Volume, Object-Oriented programming, PDE solvers, 
unstructured mesh 



14 An introductory ...

J. P. Tremblay, P. G. Sorenson

September 1975 ACM SIGCSE Bulletin, Volume 7 issue 3

Full text available: pdf. Additional Information: full citation, abstract, citings, index terms

This paper describes a two semester introductory course in data (information) structures for 
the undergraduate computer science student that has evolved at the University of 
Saskatchewan, Saskatoon. The philosophy and organization of such a course are discussed. 
A comparison is made between the course described and data structure courses proposed by 
two committees on curricula.

15 Programming data parallel algorithms on distributed memory using Kali
Charles Koelbel, Piyush Mehrotra 

June 1991 Proceedings of the 5th international conference on Supercomputing 

Full text available: pdf. Additional Information: full citation, references, citings, index terms




16 The design, implementation ...

Anshul Gupta, Fred G. Gustavson, Mahesh Joshi, Sivan Toledo

March 1998 ACM Transactions on Mathematical Software (TOMS), volume 24 issue 1

Full text available. Additional Information: full citation, abstract, references, citings, index terms, review



http://portal.acm.org/results.cfm?coll=ACM&dl=ACM&CFID=403 1941 1&CFT0KEN=93 1 . 



3/18/05 



Results (page 1): solver process and data stores and data structures 



Page 5 of 6 



This article describes the design, implementation, and evaluation of a parallel algorithm for 
the Cholesky factorization of symmetric banded matrices. The algorithm is part of IBM's 
parallel engineering and scientific subroutine library version 1.2 and is compatible with 
ScaLAPACK's banded solver. Analysis, as well as experiments on an IBM SP2 distributed- 
memory parallel computer, shows that the algorithm efficiently factors banded matrices with 
wide bandwidth. For example, a 31-node SP2 fa ...

Keywords: Banded matrices, Cholesky factorization, distributed memory, parallel algorithm 



17 Compression of particle ...

Dow-Yung Yang, Ananth Grama, Vivek Sarin, Naren Ramakrishnan 

September 2001 ACM Transactions on Mathematical Software (TOMS), volume 27 issue 3 

Full text available: pdf. Additional Information: full citation, abstract, references, index terms

This article presents an analytical and computational framework for the compression of 
particle data resulting from hierarchical approximate treecodes such as the Barnes—Hut and 
Fast Multipole Methods. Due to approximations introduced by hierarchical methods, various 
parameters (such as position, velocity, acceleration, potential) associated with a particle can 
be bounded by distortion radii. Using this distortion radii, we develop storage schemes that 
guarantee error bounds while ... 

Keywords: Astrophysics, Barnes—Hut, Fast Multipole Method, data compression and 
analysis, materials simulation, molecular dynamics, particle dynamics 



18 A parallel direct solver ...
Iain S. Duff, Jennifer A. Scott 

June 2004 ACM Transactions on Mathematical Software (TOMS), volume 30 issue 2 

Full text available: pdf. Additional Information: full citation, abstract, references, index terms

The need to solve large sparse linear systems of equations efficiently lies at the heart of 
many applications in computational science and engineering. For very large systems when 
using direct factorization methods of solution, it can be beneficial and sometimes necessary 
to use multiple processors, because of increased memory availability as well as reduced 
factorization time. We report on the development of a new parallel code that is designed to 
solve linear systems with a highly unsymmetric ... 

Keywords: Gaussian elimination, Sparse matrices, highly unsymmetric linear systems, 
parallel processing 



19 Applications and problem solving environments: Roccom: an object-oriented,
data-centric software ...

Xiangmin Jiao, Michael T. Campbell, Michael T. Heath 

June 2003 Proceedings of the 17th annual international conference on 
Supercomputing 

Full text available: pdf (265.82 KB). Additional Information: full citation, abstract, references, index terms

We describe an object-oriented software integration framework, Roccom, abstracted from 
our five years of experience in developing a complex, integrated code for rocket simulation. 
Roccom provides a flexible mechanism for inter-module data exchange and function 
invocation in parallel multiphysics simulations. It is designed to minimize user effort and 
code changes for integration, facilitate interoperability between different programming 
languages (in particular, C++ and Fortran 90) ... 
Keywords: middleware, multiphysics simulation, object-oriented design, problem solving
environments, system integration 



20 Parallel execution ...

Gopal Gupta, Enrico Pontelli, Khayri A.M. Ali, Mats Carlsson, Manuel V. Hermenegildo
July 2001 ACM Transactions on Programming Languages and Systems (TOPLAS),
Volume 23 Issue 4

Full text available: pdf. Additional Information: full citation, abstract, references, citings, index terms

Since the early days of logic programming, researchers in the field realized the potential for 
exploitation of parallelism present in the execution of logic programs. Their high-level 
nature, the presence of nondeterminism, and their referential transparency, among other 
characteristics, make logic programs interesting candidates for obtaining speedups through 
parallel execution. At the same time, the fact that the typical applications of logic 
programming frequently involve irregular computatio ... 

Keywords: Automatic parallelization, constraint programming, logic programming, 
parallelism, prolog 
1 Efficient load balancing and data remapping for adaptive grid calculations
Leonid Oliker, Rupak Biswas 

June 1997 Proceedings of the ninth annual ACM symposium on Parallel algorithms and 
architectures 

Full text available: pdf. Additional Information: full citation, references, citings, index terms



2 Parallel execution of prolog ...

Gopal Gupta, Enrico Pontelli, Khayri A.M. Ali, Mats Carlsson, Manuel V. Hermenegildo
July 2001 ACM Transactions on Programming Languages and Systems (TOPLAS),
Volume 23 Issue 4

Full text available: pdf (1.95 MB). Additional Information: full citation, abstract, references, citings, index terms

Since the early days of logic programming, researchers in the field realized the potential for 
exploitation of parallelism present in the execution of logic programs. Their high-level 
nature, the presence of nondeterminism, and their referential transparency, among other 
characteristics, make logic programs interesting candidates for obtaining speedups through 
parallel execution. At the same time, the fact that the typical applications of logic 
programming frequently involve irregular computatio ... 

Keywords: Automatic parallelization, constraint programming, logic programming, 
parallelism, prolog 



3 Fast detection of communication patterns in distributed executions

Thomas Kunz, Michiel F. H. Seuren 

November 1997 Proceedings of the 1997 conference of the Centre for Advanced Studies 
on Collaborative research 

Full text available: pdf. Additional Information: full citation, abstract, references, index terms

Understanding distributed applications is a tedious and difficult task. Visualizations based on 
process-time diagrams are often used to obtain a better understanding of the execution of 
the application. The visualization tool we use is Poet, an event tracer developed at the 
University of Waterloo. However, these diagrams are often very complex and do not provide 
the user with the desired overview of the application. In our experience, such tools display 
repeated occurrences of non-trivial commun ... 
4 HPFBench: a high performance Fortran benchmark suite

Y. Charlie Hu, Guohua Jin, S. Lennart Johnsson, Dimitris Kehagias, Nadia Shalaby
March 2000 ACM Transactions on Mathematical Software (TOMS), volume 26 issue 1

Full text available: pdf (274.52 KB). Additional Information: full citation, abstract, references, index terms

The high performance Fortran (HPF) benchmark suite HPFBench is designed for evaluating 
the HPF language and compilers on scalable architectures. The functionality of the 
benchmarks covers scientific software library functions and application kernels that reflect 
the computational structure and communication patterns in fluid dynamic simulations, 
fundamental physics, and molecular studies in chemistry and biology. The benchmarks are 
characterized in terms of FLOP count, memory usage, communi ... 

Keywords: benchmarks, compilers, high performance Fortran 




5 Experience 

Anita K. Jones, Peter Schwarz 

June 1980 ACM Computing Surveys (CSUR), volume 12 issue 2 

Full text available: pdf (4.48 MB). Additional Information: full citation, references, citings, index terms



6 Mapping performance ...

R. Bruce Irvin, Barton P. Miller 

January 1996 Proceedings of the 10th international conference on Supercomputing 

Full text available: pdf. Additional Information: full citation, references, citings, index terms



7 Automatic generation of intelligent diagram editors 
Sitt Sen Chok, Kim Marriott 

September 2003 ACM Transactions on Computer-Human Interaction (TOCHI), volume 10 Issue 3

Full text available: pdf. Additional Information: full citation, abstract, references, index terms

The intelligent diagram is a recent metaphor for diagramming in which the underlying 
graphic editor parses the diagram as it is being constructed, performing error correction and 
collecting geometric constraints that capture the relationships between diagram 
components. During diagram manipulation a constraint solver uses these geometric 
constraints to maintain the diagram's semantics. We introduce the Penguins system. This 
automates the development of graphical editors that support the i ... 

Keywords: Constraint multi-set grammars, constraint solving, diagram interaction, diagram 
parsing, intelligent diagram, pen-based computing 



8 The role 

A. Bolour, T. L. Anderson, L. J. Dekeyser, H. K. T. Wong
April 1982 ACM SIGMOD Record, volume 12 issue 3

Full text available: pdf. Additional Information: full citation, references, citings
Henri E. Bal, Jennifer G. Steiner, Andrew S. Tanenbaum

September 1989 ACM Computing Surveys (CSUR), Volume 21 issue 3

Full text available: pdf. Additional Information: full citation, references, citings, index terms, review

When distributed systems first appeared, they were programmed in traditional sequential 
languages, usually with the addition of a few library procedures for sending and receiving 
messages. As distributed applications became more commonplace and more sophisticated, 
this ad hoc approach became less satisfactory. Researchers all over the world began 
designing new programming languages specifically for implementing distributed applications. 
These languages and their history, their underlying pr ... 

10 A framework for call graph construction algorithms 
David Grove, Craig Chambers 

November 2001 ACM Transactions on Programming Languages and Systems (TOPLAS),
Volume 23 Issue 6

Full text available: pdf. Additional Information: full citation, abstract, references, citings, index terms



A large number of call graph construction algorithms for object-oriented and functional 
languages have been proposed, each embodying different tradeoffs between analysis cost 
and call graph precision. In this article we present a unifying framework for understanding 
call graph construction algorithms and an empirical comparison of a representative set of 
algorithms. We first present a general parameterized algorithm that encompasses many 
well-known and novel call graph construction algorithms. W ... 

Keywords: Call graph construction, control flow analysis, interprocedural analysis 



11 Technical reports
SIGACT News Staff

January 1980 ACM SIGACT News, Volume 12 issue 1

Full text available: pdf. Additional Information: full citation



12 Software unit test coverage and adequacy 
Hong Zhu, Patrick A. V. Hall, John H. R. May 

December 1997 ACM Computing Surveys (CSUR), Volume 29 issue 4 

Full text available: pdf. Additional Information: full citation, abstract, references, citings, index terms, review

Objective measurement of test quality is one of the key issues in software testing. It has 
been a major research focus for the last two decades. Many test criteria have been proposed 
and studied for this purpose. Various kinds of rationales have been presented in support of 
one criterion or another. We survey the research work in this area. The notion of adequacy 
criteria is examined together with its role in software dynamic testing. A review of criteria 
classification is followed by a sum ... 

Keywords: comparing testing effectiveness, fault detection, software unit test, test 
adequacy criteria, test coverage, testing methods 
November 2003 Proceedings of the 1st international conference on Embedded
networked sensor systems

Full text available: pdf. Additional Information: full citation, abstract, references, citings

Wireless sensor networks enable dense sensing of the environment, offering unprecedented 
opportunities for observing the physical world. Centralized data collection and analysis 
adversely impact sensor node lifetime. Previous sensor network research has, therefore, 
focused on in network aggregation and query processing, but has done so for applications 
where the features of interest are known a priori. When features are not known a priori, as 
is the case with many scientific applications in dens ... 

14 A comparison of three programming models for adaptive applications on the
Origin2000

Hongzhang Shan, Jaswinder P. Singh, Leonid Oliker, Rupak Biswas 

November 2000 Proceedings of the 2000 ACM/IEEE conference on Supercomputing 
(CDROM) 

Full text available: pdf. Additional Information: full citation, abstract, references, citings, index terms

Adaptive applications have computational workloads and communication patterns which 
change unpredictably at runtime, requiring load balancing to achieve scalable performance 
on parallel machines. Efficient parallel implementation of such adaptive application is 
therefore a challenging task. In this paper, we compare the performance of and the 
programming effort required for two major classes of adaptive applications under three 
leading parallel programming models on an SGI Origin 2000 syste ... 

Ben Wegbreit 

September 1975 Communications of the ACM, volume 18 issue 9

Full text available: pdf. Additional Information: full citation, abstract, references, citings, index terms

One means of analyzing program performance is by deriving closed-form expressions for 
their execution behavior. This paper discusses the mechanization of such analysis, and 
describes a system, Metric, which is able to analyze simple Lisp programs and produce, for 
example, closed-form expressions for their running time expressed in terms of size of input. 
This paper presents the reasons for mechanizing program analysis, describes the operation 
of Metric, explains its implementation, and disc ... 

Keywords: Lisp, algebraic manipulation, analysis of algorithms, analysis of programs, 
difference equations, execution behavior, execution time, generating functions, list 
processing, performance analysis, programming languages 



16 Data placement ...

George Copeland, William Alexander, Ellen Boughter, Tom Keller 

June 1988 ACM SIGMOD Record , Proceedings of the 1988 ACM SIGMOD international 
conference on Management of data, volume 17 issue 3 

Full text available: pdf. Additional Information: full citation, abstract, references, citings, index terms

This paper examines the problem of data placement in Bubba, a highly-parallel system for 
data-intensive applications being developed at MCC. "Highly-parallel" implies that load 
balancing is a critical performance issue. "Data-intensive" means data is so large that 
operations should be executed where the data resides. As a result, data placement becomes 
a critical performance issue. In general, determining the optimal placement of d ... 

17 
Dong H. Ahn, Jeffrey S. Vetter

November 2002 Proceedings of the 2002 ACM/IEEE conference on Supercomputing 

Full text available: pdf. Additional Information: full citation, abstract, references, index terms

Contemporary microprocessors provide a rich set of integrated performance counters that 
allow application developers and system architects alike the opportunity to gather important 
information about workload behaviors. Current techniques for analyzing data produced from 
these counters use raw counts, ratios, and visualization techniques help users make 
decisions about their application performance. While these techniques are appropriate for 
analyzing data from one process, they do not scale easi ... 

18 Analysis and comparison ... computers

Patrick R. Amestoy, Iain S. Duff, Jean-Yves L'excellent, Xiaoye S. Li 

December 2001 ACM Transactions on Mathematical Software (TOMS), volume 27 issue 4 

Full text available: pdf. Additional Information: full citation, abstract, references, citings, index terms, review

This paper provides a comprehensive study and comparison of two state-of-the-art direct 
solvers for large sparse sets of linear equations on large-scale distributed-memory 
computers. One is a multifrontal solver called MUMPS, the other is a supernodal solver called 
superLU. We describe the main algorithmic features of the two solvers and compare their 
performance characteristics with respect to uniprocessor speed, interprocessor 
communication, and memory requirements. For both solvers, preorderi ... 

Keywords: Sparse direct solvers, distributed-memory computers, multifrontal and 
supernodal factorizations, parallelism 
This paper presents the design and implementation of the Intentional Naming System (INS),
a resource discovery and service location system for dynamic and mobile networks of 
devices and computers. Such environments require a naming system that is (i) expressive, 
to describe and make requests based on specific properties of services, (ii) responsive, to 
track changes due to mobility and performance, (iii) robust, to handle failures, and (iv)
easily configurable. INS uses a simple language based o ... 



Results 1 - 20 of 200

The ACM Portal is published by the Association for Computing Machinery. Copyright © 2005 ACM, Inc. 




Results (page 1): solver process and data stores and data structures and assignment and fir.,. Page 6 of 6






Results (page 1): solver process and data stores and first metric and second metric



Page 1 of 6 






US Patent & Trademark Office






Terms used 

solver process and data stores and first metric and second metric 






Results 1 - 20 of 200 

Best 200 shown 






Found 102,185 of 151,219 




1 Fast detection of communication patterns in distributed executions 
November 1997 Proceedings of the 1997 conference of the Centre for Advanced Studies
on Collaborative research

Full text available: pdf (4.21 MB). Additional Information: full citation, abstract, references, index terms

Understanding distributed applications is a tedious and difficult task. Visualizations based on 
process-time diagrams are often used to obtain a better understanding of the execution of 
the application. The visualization tool we use is Poet, an event tracer developed at the 
University of Waterloo. However, these diagrams are often very complex and do not provide 
the user with the desired overview of the application. In our experience, such tools display 
repeated occurrences of non-trivial commun ... 
MS Manners is a mechanism that employs progress-based regulation to prevent resource
contention with low-importance processes from degrading the performance of high- 
importance processes. The mechanism assumes that resource contention that degrades the 
performance of a high-importance process will also retard the progress of the low- 
importance process. MS Manners detects this contention by monitoring the progress of the 
low-importance process and inferring resource contention from a drop in the p ... 

Keywords: process priority, progress-based feedback, symmetric resource contention 



3 HPFBench: a high performance Fortran benchmark suite 
March 2000 ACM Transactions on Mathematical Software (TOMS), volume 26 issue 1

The high performance Fortran (HPF) benchmark suite HPFBench is designed for evaluating
the HPF language and compilers on scalable architectures. The functionality of the
benchmarks covers scientific software library functions and application kernels that reflect
the computational structure and communication patterns in fluid dynamic simulations, 
fundamental physics, and molecular studies in chemistry and biology. The benchmarks are 
characterized in terms of FLOP count, memory usage, communi ... 

Keywords: benchmarks, compilers, high performance Fortran 
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Dong H. Ahn, Jeffrey S. Vetter 

November 2002 Proceedings of the 2002 ACM/IEEE conference on Supercomputing 

Full text available: pdf Additional Information: full citation, abstract, references, index terms

Contemporary microprocessors provide a rich set of integrated performance counters that 
allow application developers and system architects alike the opportunity to gather important 
information about workload behaviors. Current techniques for analyzing data produced from 
these counters use raw counts, ratios, and visualization techniques to help users make
decisions about their application performance. While these techniques are appropriate for 
analyzing data from one process, they do not scale easi ... 

5 Minerva: An automated resource provisioning tool for large-scale storage systems 
Guillermo A. Alvarez, Elizabeth Borowsky, Susie Go, Theodore H. Romer, Ralph Becker- 
Szendy, Richard Golding, Arif Merchant, Mirjana Spasojevic, Alistair Veitch, John Wilkes 
November 2001 ACM Transactions on Computer Systems (TOCS), volume 19 issue 4 

Full text available: pdf(701.96 KB) Additional Information: full citation, abstract, references, index terms

Enterprise-scale storage systems, which can contain hundreds of host computers and 
storage devices and up to tens of thousands of disks and logical volumes, are difficult to 
design. The volume of choices that need to be made is massive, and many choices have 
unforeseen interactions. Storage system design is tedious and complicated to do by hand, 
usually leading to solutions that are grossly over-provisioned, substantially
under-performing or, in the worst case, both. To solve the configuration ni ...

Keywords: Disk array, RAID, automatic design 
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Recent work has demonstrated the effectiveness of the wavelet decomposition in reducing 
large amounts of data to compact sets of wavelet coefficients (termed "wavelet synopses") 
that can be used to provide fast and reasonably accurate approximate query answers. A 
major shortcoming of these existing wavelet techniques is that the quality of the 
approximate answers they provide varies widely, even for identical queries on nearly 
identical values in distinct parts of the data. As a result, users ha ... 

Keywords: Wavelets, approximate query processing, data synopses, randomized rounding 
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8 Parallel execution of prolog programs: a survey 
July 2001 ACM Transactions on Programming Languages and Systems (TOPLAS),
Volume 23 Issue 4

Since the early days of logic programming, researchers in the field realized the potential for 
exploitation of parallelism present in the execution of logic programs. Their high-level 
nature, the presence of nondeterminism, and their referential transparency, among other 
characteristics, make logic programs interesting candidates for obtaining speedups through 
parallel execution. At the same time, the fact that the typical applications of logic 
programming frequently involve irregular computatio ... 

Keywords: Automatic parallelization, constraint programming, logic programming, 
parallelism, prolog 
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March 2003 Proceedings of the international symposium on Code generation and 
optimization: feedback-directed and runtime optimization 
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In this paper, we present METRIC, an environment for determining memory inefficiencies by 
examining data traces. METRIC is designed to alter the performance behavior of applications 
that are mostly constrained by their latency to resolve memory references. We make four 
primary contributions in this paper. First, we present methods to extract partial data traces 
from running applications by observing their memory behavior via dynamic binary rewriting. 
Second, we present a methodology to represent ... 
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Jonathan E. Cook, Alexander L. Wolf 

April 1999 ACM Transactions on Software Engineering and Methodology (TOSEM),
Volume 8 Issue 2

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms
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To a great extent, the usefulness of a formal model of a software process lies in its ability to 
accurately predict the behavior of the executing process. Similarly, the usefulness of an 
executing process lies largely in its ability to fulfill the requirements embodied in a formal 
model of the process. When process models and process executions diverge, something 
significant is happening. We have developed techniques for uncovering and measuring the 
discrepancies between models and executio ... 

Keywords: balboa, process validation, software process, tools 
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Performance assertion checking is an approach to automating the testing of performance 
properties of complex systems. System designers write assertions that capture expectations 
for performance; these assertions are checked automatically against monitoring data to 
detect potential performance bugs. Automatically checking expectations allows a designer to 
test a wide range of performance properties as a system evolves: data that meets 
expectations can be discarded automatically, focusing a ... 
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Abhinandan Das, Johannes Gehrke, Mirek Riedewald 

June 2003 Proceedings of the 2003 ACM SIGMOD international conference on
Management of data

Full text available: pdf(282.07 KB) Additional Information: full citation, abstract, references, citings, index terms

We consider the problem of approximating sliding window joins over data streams in a data 
stream processing system with limited resources. In our model, we deal with resource 
constraints by shedding load in the form of dropping tuples from the data streams. We first 
discuss alternate architectural models for data stream join processing, and we survey 
suitable measures for the quality of an approximation of a set-valued query result. We then 
consider the number of generated result tuples as the q ... 
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Xiaoye S. Li, James W. Demmel 

June 2003 ACM Transactions on Mathematical Software (TOMS), volume 29 issue 2 
Full text available: pdf(659.03 KB)
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We present the main algorithmic features in the software package SuperLU_DIST, a 
distributed-memory sparse direct solver for large sets of linear equations. We give in detail 
our parallelization strategies, with a focus on scalability issues, and demonstrate the 
software's parallel performance and scalability on current machines. The solver is based on 
sparse Gaussian elimination, with an innovative static pivoting strategy proposed earlier by 
the authors. The main advantage of static pivoting o ... 

Keywords: Sparse direct solver, distributed-memory computers, parallelism, scalability, 
supernodal factorization 
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This paper presents the design and implementation of the Intentional Naming System (INS), 
a resource discovery and service location system for dynamic and mobile networks of 
devices and computers. Such environments require a naming system that is (i) expressive, 
to describe and make requests based on specific properties of services, (ii) responsive, to 
track changes due to mobility and performance, (iii) robust, to handle failures, and (iv) 
easily configurable. INS uses a simple language based o ...
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Recent work has demonstrated the effectiveness of the wavelet decomposition in reducing 
large amounts of data to compact sets of wavelet coefficients (termed "wavelet synopses") 
that can be used to provide fast and reasonably accurate approximate answers to queries. A 
major criticism of such techniques is that unlike, for example, random sampling, 
conventional wavelet synopses do not provide informative error guarantees on the accuracy 
of individual approximate answers. In fact, as this paper de ... 
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May 1996 Proceedings of the fourth workshop on I/O in parallel and distributed 
systems: part of the federated computing research conference 

To achieve high performance, contemporary computer systems rely on two forms of 
parallelism: instruction-level parallelism (ILP) and thread-level parallelism (TLP). Wide-issue 
super-scalar processors exploit ILP by executing multiple instructions from a single program 
in a single cycle. Multiprocessors (MP) exploit TLP by executing different threads in parallel 
on different processors. Unfortunately, both parallel processing styles statically partition 
processor resources, thus preventing t ... 

Keywords: cache interference, instruction-level parallelism, multiprocessors, 
multithreading, simultaneous multithreading, thread-level parallelism 
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One means of analyzing program performance is by deriving closed-form expressions for 
their execution behavior. This paper discusses the mechanization of such analysis, and 
describes a system, Metric, which is able to analyze simple Lisp programs and produce, for 
example, closed-form expressions for their running time expressed in terms of size of input. 
This paper presents the reasons for mechanizing program analysis, describes the operation 
of Metric, explains its implementation, and disc ... 

Keywords: Lisp, algebraic manipulation, analysis of algorithms, analysis of programs, 
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We describe an approach to text classification that represents a compromise between 
traditional word-based techniques and in-depth natural language processing. Our approach 
uses a natural language processing task called "information extraction" as a basis for high- 
precision text classification. We present three algorithms that use varying amounts of 
extracted information to classify texts. The relevancy signatures algorithm uses linguistic 
phrases; the a ... 

Keywords: information extraction, text classification 
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The intelligent diagram is a recent metaphor for diagramming in which the underlying 
graphic editor parses the diagram as it is being constructed, performing error correction and 
collecting geometric constraints that capture the relationships between diagram 
components. During diagram manipulation a constraint solver uses these geometric 
constraints to maintain the diagram's semantics. We introduce the Penguins system. This 
automates the development of graphical editors that support the i ... 

Keywords: Constraint multi-set grammars, constraint solving, diagram interaction, diagram 
parsing, intelligent diagram, pen-based computing 
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November 2001 ACM Transactions on Computer Systems (TOCS), Volume 19 issue 4 
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Enterprise-scale storage systems, which can contain hundreds of host computers and 
storage devices and up to tens of thousands of disks and logical volumes, are difficult to 
design. The volume of choices that need to be made is massive, and many choices have 
unforeseen interactions. Storage system design is tedious and complicated to do by hand, 
usually leading to solutions that are grossly over-provisioned, substantially
under-performing or, in the worst case, both. To solve the configuration ni ...
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Understanding distributed applications is a tedious and difficult task. Visualizations based on 
process-time diagrams are often used to obtain a better understanding of the execution of 
the application. The visualization tool we use is Poet, an event tracer developed at the 
University of Waterloo. However, these diagrams are often very complex and do not provide 
the user with the desired overview of the application. In our experience, such tools display 
repeated occurrences of non-trivial commun ... 
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This paper presents a constructive algorithm for memory-aware task assignment and 
scheduling, which is a part of the prototype system MATAS. The algorithm is well suited for 
image and video processing applications which have hard memory constraints as well as 
constraints on cost, execution time, and resource usage. Our algorithm takes into account 
code and data memory constraints together with the other constraints. It can create 
pipelined implementations. The algorithm finds a task assignmen ... 

Keywords: constraint programming, memory constraints, task assignment, task scheduling 
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The need to solve large sparse linear systems of equations efficiently lies at the heart of 
many applications in computational science and engineering. For very large systems when 
using direct factorization methods of solution, it can be beneficial and sometimes necessary 
to use multiple processors, because of increased memory availability as well as reduced 
factorization time. We report on the development of a new parallel code that is designed to
solve linear systems with a highly unsymmetric ... 

Keywords: Gaussian elimination, Sparse matrices, highly unsymmetric linear systems, 
parallel processing 
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Low power has emerged as a principal theme in today's electronics industry. The need for 
low power has caused a major paradigm shift in which power dissipation is as important as 
performance and area. This article presents an in-depth survey of CAD methodologies and 
techniques for designing low power digital CMOS circuits and systems and describes the 
many issues facing designers at architectural, logical, and physical levels of design 
abstraction. It reviews some of the techniques and tool ... 

Keywords: CMOS circuits, adiabatic circuits, computer-aided design of VLSI, dynamic 
power dissipation, energy-delay product, gated clocks, layout, low power layout, low power 
synthesis, lower-power design, power analysis and estimation, power management, power 
minimization and management, probabilistic analysis, silicon-on-insulator technology, 
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We compile Nova, a new language designed for writing network processing applications, 
using a back end based on integer-linear programming (ILP) for register allocation, optimal 
bank assignment, and spills. The compiler's optimizer employs CPS as its intermediate 
representation; some of the invariants that this IR guarantees are essential for the 
formulation of a practical ILP model. Appel and George used a similar ILP-based technique 
for the IA32 to decide which variables reside in registers but ... 

Keywords: Intel IXA, bank assignment, code generation, integer linear programming, 
network processors, programming languages, register allocation 
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Distributed operating systems have many aspects in common with centralized ones, but 
they also differ in certain ways. This paper is intended as an introduction to distributed 
operating systems, and especially to current university research about them. After a 
discussion of what constitutes a distributed operating system and how it is distinguished 
from a computer network, various key design issues are discussed. Then several examples 
of current research projects are examined in some detail ... 
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The final report of the Task Force on the Core of Computer Science presents a new 
intellectual framework for the discipline of computing and a new basis for computing 
curricula. This report has been endorsed and approved for release by the ACM Education 
Board. 
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Wireless sensor networks enable dense sensing of the environment, offering unprecedented 
opportunities for observing the physical world. Centralized data collection and analysis 
adversely impact sensor node lifetime. Previous sensor network research has, therefore, 
focused on in-network aggregation and query processing, but has done so for applications
where the features of interest are known a priori. When features are not known a priori, as 
is the case with many scientific applications in dens ... 
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The goals of performance and flexibility are often at odds in the design of network systems. 
The tension is common enough to justify an architectural solution, rather than a set of 
context-specific solutions. The Programmable Protocol Processing Pipeline (P4) design uses 
programmable hardware to selectively accelerate protocol processing functions. A set of 
field-programmable gate arrays (FPGAs) and an associated library of network processing 
modules implemented in hardware are augmented with so ... 

Keywords: FPGA, P4, computer networking, flexibility, hardware, performance, 
programmable logic devices, programmable networks, protocol processing 
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Spreadsheet languages, which include commercial spreadsheets and various research 
systems, have had a substantial impact on end-user computing. Research shows, however, 
that spreadsheets often contain faults; thus, we would like to provide at least some of the 
benefits of formal testing methodologies to the creators of spreadsheets. This article 
presents a testing methodology that adapts data flow adequacy criteria and coverage 
monitoring to the task of testing spreadsheets. To accommodate ... 
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FPGAs are a promising technology for accelerating SAT solvers. Besides their high density, 
fine granularity, and massive parallelism, FPGAs provide the opportunity for run-time 
customization of the hardware based on the given SAT instance. In this article, a parallel 
deduction engine is proposed for backtrack search algorithms. The performance of the 
deduction engine is critical to the overall performance of the algorithm because, for any 
moderate SAT instance, millions of implications are deriv ... 

Keywords: Adaptive computing, Boolean satisfiability, configurable, high performance, 
performance trade-offs, reconfigurable components, reconfigurable computing, 
reconfigurable systems 
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Many computationally intensive problems in engineering and science give rise to the solution 
of large, sparse, linear systems of equations. Fast and efficient methods for their solution are
very important because these systems usually occur in the innermost loop of the 
computational scheme. Parallelization is often necessary to achieve an acceptable level of 
performance. This paper presents the design, implementation, and interface of a library of 
Basic Linear Algebra Subroutines for sparse ... 
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